adjoint method
Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous dynamics of a Neural ODE. We first quantify the distance between the ResNet's hidden-state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to $0$ with depth $N$ if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and a small enough initial loss. This ensures an implicit regularization towards a limit Neural ODE at rate $\frac{1}{N}$, uniformly in depth and optimization time.
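A minimal numerical sketch of the correspondence the abstract describes (our illustration, not the paper's code): an $N$-layer ResNet update $x_{n+1} = x_n + \frac{1}{N} f_n(x_n)$ is an explicit Euler step for $\dot{x} = f(x, t)$ when the residual functions $f_n(x) = f(x, n/N)$ vary smoothly with depth. The vector field `f` below is a hypothetical example.

```python
import numpy as np

def resnet_forward(x0, residual_fns):
    """Discrete ResNet endpoint: x_{n+1} = x_n + (1/N) * f_n(x_n)."""
    x, N = x0, len(residual_fns)
    for f_n in residual_fns:
        x = x + (1.0 / N) * f_n(x)
    return x

def ode_solution(x0, f, steps=100_000):
    """Fine explicit Euler integration of dx/dt = f(x, t) on [0, 1],
    standing in for the exact Neural ODE solution."""
    x, h = x0, 1.0 / steps
    for k in range(steps):
        x = x + h * f(x, k * h)
    return x

f = lambda x, t: np.tanh(x) * np.cos(2.0 * t)   # hypothetical smooth field
x0 = np.array([1.0, -0.5])

for N in (10, 100, 1000):
    # Residual functions sampled from the smooth field at depths n/N.
    fns = [(lambda n: (lambda x: f(x, n / N)))(n) for n in range(N)]
    gap = np.linalg.norm(resnet_forward(x0, fns) - ode_solution(x0, f))
    print(f"N={N:5d}  ||ResNet - ODE|| = {gap:.2e}")   # shrinks like 1/N
```

Running this shows the discrete-continuous gap decreasing roughly as $\frac{1}{N}$, the rate stated in the abstract, precisely because these residual functions are smooth in depth.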
Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory
A neural ODE is a neural network model of a differential equation; it has enabled the learning of continuous-time dynamical systems and probability distributions with high accuracy. A neural ODE evaluates the same network repeatedly during numerical integration, so the memory consumption of the backpropagation algorithm is proportional to the number of evaluations times the network size. This holds even if a checkpointing scheme divides the computation graph into sub-graphs.
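To make the memory trade-off concrete, here is a sketch of the standard continuous adjoint method for a toy neural ODE, using explicit Euler in both time directions (our illustration with a hypothetical scalar-parameter field, not the paper's symplectic method): only the final state is stored, so memory is $O(1)$ in the number of solver steps, whereas naive backpropagation stores every intermediate state. Note the well-known caveat, which motivates the paper: naively re-integrating the state backward makes this gradient only approximate, and the symplectic adjoint method is designed to recover the exact gradient at comparable memory cost.

```python
import numpy as np

def f(x, theta):
    """Hypothetical vector field: dx/dt = tanh(theta * x), theta a scalar."""
    return np.tanh(theta * x)

def adjoint_gradient(x0, theta, steps=1000):
    """Gradient of L = 0.5 * ||x(1)||^2 w.r.t. theta via the continuous
    adjoint, with explicit Euler in both time directions."""
    h = 1.0 / steps
    # Forward pass: only the final state is kept -> O(1) memory in `steps`,
    # versus O(steps) for backpropagating through the unrolled solver.
    x = x0.copy()
    for _ in range(steps):
        x = x + h * f(x, theta)
    loss = 0.5 * np.sum(x ** 2)
    lam = x.copy()                         # lambda(1) = dL/dx(1)
    grad = 0.0
    # Backward pass: re-integrate the state in reverse time together with
    # the adjoint ODE dlambda/dt = -lambda * (df/dx), accumulating
    # dL/dtheta = integral_0^1 lambda * (df/dtheta) dt.
    for _ in range(steps):
        x = x - h * f(x, theta)
        s = 1.0 - np.tanh(theta * x) ** 2  # tanh'(theta * x)
        lam = lam + h * lam * (theta * s)  # df/dx is diagonal here
        grad += h * np.sum(lam * (s * x))  # df/dtheta_i = s_i * x_i
    return loss, grad

print(adjoint_gradient(np.array([0.3, -0.7, 1.1]), theta=0.8))
```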
Supplementary Material: Appendices
A Derivation of the Variational System
Let us consider a perturbed initial condition $\tilde{x}(0) = x(0) + \delta(0)$ and suppose that the perturbed solution $\tilde{x}(t)$ satisfies $\tilde{x}(t) = x(t) + \delta(t)$; substituting this into the ODE and keeping first-order terms in $\delta$ yields the variational system.
Proof of Remark 4: Eq. (6) can be rewritten as $\lambda$ … The variational and adjoint variables are augmented in the same way. By definition, the naive backpropagation algorithm, the baseline scheme, ACA, and the proposed symplectic adjoint method all provide the exact gradient up to rounding error. Memory consumption and computation time depend strongly on the implementation and device. When implemented on a GPU, the convolution operation is easily parallelized in space and therefore exhibits non-deterministic behavior.
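For concreteness, the standard first-order expansion behind the derivation above, in our notation (this does not reproduce the paper's Eq. (6), whose body is missing from the source):

```latex
% Substituting \tilde{x}(t) = x(t) + \delta(t) into \dot{x} = f(x, t) and
% keeping first-order terms in \delta gives the variational system; the
% adjoint system is its standard time-reversed counterpart.
\begin{align*}
  \frac{\mathrm{d}\delta}{\mathrm{d}t}
    &= \frac{\partial f}{\partial x}\bigl(x(t), t\bigr)\,\delta(t)
    && \text{(variational system)}\\
  \frac{\mathrm{d}\lambda}{\mathrm{d}t}
    &= -\,\lambda(t)\,\frac{\partial f}{\partial x}\bigl(x(t), t\bigr)
    && \text{(adjoint system)}
\end{align*}
```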